Normalize datetime types for unified query API #5408
dai-chen wants to merge 1 commit into opensearch-project:main
70aabfe to 01fba06
All test-related CI is failing due to a Gradle version issue.
01fba06 to cc90d7f
Will rebase once #5414 is merged.
Add postAnalysisRules (List<RelShuttle>) to LanguageSpec.LanguageExtension and register DatetimeExtension in UnifiedPplSpec with two rules:

1. DatetimeUdtNormalizeRule rewrites datetime UDT return types (EXPR_DATE/TIME/TIMESTAMP) on RexCall nodes to standard Calcite DATE/TIME(9)/TIMESTAMP(9) types via call.clone(). Precision is derived from the type system (OpenSearchTypeSystem.getMaxPrecision).
2. DatetimeOutputCastRule adds a final LogicalProject that casts standard datetime output columns to VARCHAR, aligning with PPL's wire-format contract (ISO string representation).

Both rules run as postAnalysisRules after the planning strategy produces the RelNode, and are applied uniformly to both the SQL and PPL paths.

Also bumps OpenSearchTypeSystem max datetime precision from 3 to 9 (nanosecond) for TIME and TIMESTAMP types.

No changes to UDF definitions or implementors in core/; the mismatch between rewritten signatures and UDF implementations is a known limitation addressed separately.

Signed-off-by: Chen Dai <[email protected]>
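As a rough illustration of the first rule in the commit message, the sketch below models the return-type rewrite on a toy expression tree. It deliberately does not use Calcite: `Call` is a hypothetical stand-in for `RexCall`, types are plain strings, and the mapping table is an assumption taken only from the commit message (EXPR_DATE to DATE, EXPR_TIME to TIME(9), EXPR_TIMESTAMP to TIMESTAMP(9)). The actual rule operates on RelNode/RexCall and may differ in detail.

```java
import java.util.List;
import java.util.Map;

class UdtNormalizeSketch {
    // Hypothetical stand-in for a RexCall: operator name, operands, declared return type.
    record Call(String op, List<Object> operands, String returnType) {}

    // Assumed to mirror the max datetime precision mentioned in the commit message.
    static final int MAX_PRECISION = 9;

    // Assumed UDT-to-standard mapping; type names are modeled as plain strings.
    static final Map<String, String> UDT_TO_STANDARD = Map.of(
            "EXPR_DATE", "DATE",
            "EXPR_TIME", "TIME(" + MAX_PRECISION + ")",
            "EXPR_TIMESTAMP", "TIMESTAMP(" + MAX_PRECISION + ")");

    // Rebuilds the tree bottom-up, swapping UDT return types for standard ones.
    static Object normalize(Object node) {
        if (!(node instanceof Call call)) {
            return node; // leaves (input refs, literals) are returned unchanged
        }
        List<Object> operands = call.operands().stream()
                .map(UdtNormalizeSketch::normalize)
                .toList();
        String type = UDT_TO_STANDARD.getOrDefault(call.returnType(), call.returnType());
        return new Call(call.op(), operands, type);
    }

    public static void main(String[] args) {
        Call before = new Call("DATE", List.of("$1"), "EXPR_DATE");
        Call after = (Call) normalize(before);
        System.out.println(before.returnType() + " -> " + after.returnType());
    }
}
```

Only the declared return type changes; the operator and operands are carried over, which is the same shape as the Before/After plans in the PR description.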
cc90d7f to af2b288
Description
Add postAnalysisRules to LanguageSpec.LanguageExtension and register DatetimeExtension in UnifiedPplSpec with two post-analysis rules:

- DatetimeUdtNormalizeRule rewrites datetime UDT return types (EXPR_DATE/TIME/TIMESTAMP) on RexCall nodes to standard Calcite types, enabling downstream consumers to process the plan without UDT-related failures.
- DatetimeOutputCastRule adds a final LogicalProject that casts standard datetime output columns to VARCHAR, aligning with PPL's wire-format contract (ISO string representation).

Examples

Case 1: UDT Normalize Rule

    source=events | eval d = DATE(name) | fields d

    Before:
      LogicalProject(d=[DATE($1):EXPR_DATE])
    After:
      LogicalProject(d=[DATE($1):DATE])

Case 2: Output Cast Rule

    source=events | fields hire_date, start_time, created_at

    Before:
      LogicalProject(hire_date=[$2], start_time=[$3], created_at=[$4])
        LogicalTableScan(table=[[events]])
    After:
      LogicalProject(hire_date=[CAST($0):VARCHAR], start_time=[CAST($1):VARCHAR], created_at=[CAST($2):VARCHAR])
        LogicalProject(hire_date=[$2], start_time=[$3], created_at=[$4])
          LogicalTableScan(table=[[events]])

Notes

- [...] UnifiedQueryCompiler path. A follow-up PR will address function implementations.
- The output cast is a CAST(datetime AS VARCHAR), whose string format is engine-dependent: PPL Calcite produces ANSI SQL format (2024-01-15 12:00:00) like most other SQL databases (SparkSQL, PostgreSQL, MySQL, Oracle, SQL Server), while DataFusion produces ISO 8601 format (2024-01-15T12:00:00).

Related Issues
Part of #5250
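The engine-dependent string format mentioned in the Notes can be seen with plain java.time, independent of any query engine. This is only a sketch of the two textual forms: the class and method names are hypothetical, and the formatters are stand-ins for whatever each engine does internally when it casts a datetime to VARCHAR.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

class DatetimeCastFormats {
    // ANSI SQL style: a space between date and time (the form the Notes attribute to PPL Calcite).
    static final DateTimeFormatter ANSI_SQL = DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss");

    // ISO 8601 style: a 'T' between date and time (the form the Notes attribute to DataFusion).
    static final DateTimeFormatter ISO_8601 = DateTimeFormatter.ISO_LOCAL_DATE_TIME;

    static String ansi(LocalDateTime ts) {
        return ANSI_SQL.format(ts);
    }

    static String iso(LocalDateTime ts) {
        return ISO_8601.format(ts);
    }

    public static void main(String[] args) {
        LocalDateTime ts = LocalDateTime.of(2024, 1, 15, 12, 0, 0);
        System.out.println(ansi(ts)); // 2024-01-15 12:00:00
        System.out.println(iso(ts));  // 2024-01-15T12:00:00
    }
}
```

A client that parses the VARCHAR output would need to accept both separators, or the cast would need an explicit format, if results from different engines are expected to compare equal as strings.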
Check List
- Commits are signed per the DCO using --signoff or -s.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.